Abstract


Recent  studies  have  provided  quantitative  information  relating  to  the  very  high 
cost  of  dead  time  (time  that  sailors  are  not  undergoing  training  although  assigned  for 
training) in the Navy training system. These studies are based upon quarterly and
monthly average on board (AOB) data, which provide period averages for numerous
categories of dead time and non-dead time. Data of this type are readily accessible. It has
been suggested that a different data structure (i.e., longitudinal data, which record the
time spent by sailors in the various categories, measured from the beginning of the
courses) would provide sharper information about what is happening, help clarify
the nature of the problems and their relative importance, and suggest types of
remedial action.

The  present  report  discusses  the  data  needs  for  this  type  of  study  and  draws 
attention  to  the  data  acquisition  problems.  A  limited  amount  of  longitudinal  data  was 
acquired  for  selected  courses  and  years.  Models  were  constructed  describing  the  number 
of  days  from  entering  a  course  until  either  academic  attrition,  academic  setback,  or 
interrupted  instruction  of  the  non-legal  holiday  type.  A  distance  measure  was  developed 
for  deciding  the  separation  of  one  model  fit  from  another.  Its  use  shows  that  there  is 
considerable  variability  of  these  distributions  from  course  to  course  and  from  year  to  year. 

Also  considered  are  the  data  needs  for  the  longitudinal  study  of  the  downtimes 
between  courses  in  a  pipeline  of  courses. 

Keywords: Training, Setbacks, Attritions, Non-homogeneous Poisson Processes
1. Introduction

The costs associated with training dead time have attracted an increasing amount of
attention [Rhoades, 1999]. Dead time in naval training schools refers to man-days lost when
sailors  assigned  to  schools  are  not  in  an  instructional  mode.  There  are  many  reasons  for  this 
condition.  The  broad  categories  of  dead  time  are  awaiting  instruction  (AI),  interrupted 
instruction  (II),  and  awaiting  transfer  (AT).  The  first,  AI,  is  caused  largely  by  sailors  arriving 
for  instruction  prior  to  the  convening  of  the  course  and/or  the  condition  that  space  in  the 
classroom is not available. The second, II, reflects a large number of seemingly random
events that take students out of the classroom. It also includes legal holidays, which
appear to be the most prominent contributors even though they are scheduled rather than
random. The third, AT, often reflects glitches in the cutting of orders and the budgeting of
PCS (permanent change of station) funds. These major forms have received much attention
(see references). A good description of the flow can be found in [Belcher, et al., 1999].

The data structures used in the cited studies of dead time are gathered at fixed points
in time. Quarterly data are readily accessible from the Navy Integrated Training Resources
and Administration System (NITRAS), and monthly data can be obtained upon request. The
values are “average on board” (AOB) for the period; that is, the time average of the personnel
count in a particular category over the period. The inferences in those studies are based upon these averages.
Personnel  in  Manpower,  Personnel,  Training  and  Education  (N813)  have  suggested  that  there 
may  be  important  complementary  information  in  the  “holding  time”  distributions  that 
measure  the  number  of  days  that  students  stay  in  a  specific  state  (category)  prior  to  changing 
to  another  state.  Such  distributions  are  commonly  called  longitudinal. 
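The contrast between the two data structures can be sketched in a few lines of Python. This is an illustration only: the record layout (state, start day, end day) and the function names are assumptions made for the example and do not reflect the NITRAS schema. It shows two hypothetical sets of AI spells that yield the same AOB value but quite different holding-time distributions.

```python
def average_on_board(spells, state, period_start, period_end):
    """Time average of the head count in `state` over [period_start, period_end).

    Each spell is a hypothetical record (state, start_day, end_day),
    with end_day exclusive.
    """
    person_days = 0
    for spell_state, start, end in spells:
        if spell_state != state:
            continue
        overlap = min(end, period_end) - max(start, period_start)
        if overlap > 0:
            person_days += overlap
    return person_days / (period_end - period_start)


def holding_times(spells, state):
    """Longitudinal view: the length in days of every spell spent in `state`."""
    return sorted(end - start for spell_state, start, end in spells
                  if spell_state == state)


# One sailor waiting 30 days versus three sailors waiting 10 days each.
one_long_wait = [("AI", 0, 30)]
three_short_waits = [("AI", 0, 10), ("AI", 20, 30), ("AI", 50, 60)]

for spells in (one_long_wait, three_short_waits):
    print(average_on_board(spells, "AI", 0, 90), holding_times(spells, "AI"))
```

Both cases contribute 30 person-days to a 90-day period, so the period average cannot distinguish them; only the holding times do.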

There  are  two  main  kinds  of  states:  Under  Instruction  (UI)  and  Not  Under  Instruction 
(NUI).  The  former  is  the  preferred  state  for  all  sailors  associated  with  a  training  status.  The 
latter  is  the  all-inclusive  dead  time  state  and  contains,  of  course,  AI,  II,  and  AT  as  sub-states. 
It  is  important  to  reduce  the  holding  times  in  these  latter  states.  The  author  has  been  asked 
to  look  into  the  distributions  of  holding  times.  The  goal  is  to  identify  explanatory  variables, 
be  they  courses,  seasons  or  policies  that  promote  uptime  (UI)  and  diminish  down  time  (NUI). 

The  progress  has  been  modest.  The  acquisition  of  appropriate  data  is  difficult;  the 
databases  are  not  organized  for  direct  access  to  such  distributions.  Some  models  for  certain 
kinds  of  uptime  have  been  developed  and  tested.  The  successful  ones  are  coarse  in  nature. 
The  proper  data  requirements  are  not  yet  fully  developed.  The  present  report  documents  the 
issues and clarifies the processes involved. Some modeling has been carried out for the
random times until entry into an NUI state from a UI state (i.e., times due to attrition,
setbacks and non-holiday II). These models are presented and tested in the report. The levels of success are mixed.

Following  this  introduction  is  a  section  on  background  that  provides  some 
perspective  for  the  work  and  discusses  some  relevant  issues.  Section  3  contains  descriptions 
of  the  data  acquired  for  the  building  of  the  models  developed  and  treated.  Section  4  reports 
on the model building and testing. The summary in Section 5 includes an outline of the data
structures needed to pursue these issues properly. Details and other supporting material
are compiled in the Appendices.
2. Background

The  Navy  operates  many  schools.  A  main  goal  of  the  training  system  is  to  place 
appropriately  qualified  sailors  into  the  fleet  in  a  timely  fashion  and  in  the  proper  numbers. 
The  planning  models  no  doubt  provide  for  a  cushion  of  reserve  in  time  and  personnel,  but 
such  planning  does  not  always  result  in  full  staffing  and  the  resultant  shortfall  is  certainly 
always  expensive.  The  extent  of  the  problem  is  well  covered  in  the  references. 

There is a basic awkwardness in planning for new recruits to get into boot camp. The
recruiting system allows recruits remarkable flexibility in terms of entrance times and
choice of skill schools. Many delays are charged to AI because of timing
mismatches and over-subscription, i.e., more students than classroom seats.
Of course, there are costs associated with under-subscription as well. The awkwardness is
exacerbated because remedial action would involve both the recruiting commands and the
training commands. Other forms of AI involve the transition from one school to the next, and
the delays associated with finding a seat when there is a setback, i.e., an
underachieving student either being moved to a different section of the same course with a later
convening date or being placed in a prerequisite course for remedial work.

The AT category of NUI also involves liaison with other commands. The main items
here are the cutting of orders and the budgeting of Permanent Change of Station (PCS) monies.
Again, the retrieval of holding time data is difficult.

The most conspicuous cause in the II category of NUI is that of legal holidays. These
are easily anticipated, and it seems unlikely that administrative action will be taken to give
relief from this source. The number of days lost due to this source should have low variability.
Other forms of II occur at random times and may be treated statistically.

One might view the system as an alternating renewal process. A sailor’s sojourn in
the schools could be marked as “up” when he is in the UI state, and “down” when in the NUI
state. The holding times in the down state are likely to have multi-modal distributions. They
are  influenced  by  the  reason  for  entering  the  down  state  and  the  accounting  rules  for  the  type 
of  down  state  entered.  For  example,  when  a  sailor  graduates  from  a  class,  the  graduation 
date  is  fixed  and  capable  of  being  anticipated.  Suppose  a  change  of  location  is  required.  He 
enters  the  AT  version  of  NUI  and  it  seems  that  the  holding  time  in  this  state  should  be 
deterministic  or  have  a  low  variance.  Further,  the  follow-on  school  and  its  starting  time 
characteristics  are  known  and  can  be  planned  upon.  At  some  point,  one  would  expect  a 
transfer to the AI sub-state, but the rules for this change may not be standardized. The main
point  is  that  the  successful  students  who  are  unhindered  by  random  forms  of  disturbances  can 
flow  through  a  pipeline  of  schools  in  a  well-planned  way  (i.e.,  essentially  deterministic). 
Unfortunately,  the  data  requirements  for  studying  these  flows  have  not  been  delineated  in  a 
structured  way.  That  is,  the  students  are  commodities  that  flow  through  a  network  of  many 
paths.  The  paths  must  be  partitioned  into  sets,  often  called  pipelines,  and  every  sailor  is 
assigned a particular pipeline. There is a small amount of lateral transferability
across the pipelines. There is sharing of courses in the early part of the network structure,
but  afterwards  there  are  a  large  number  of  small  flows  from  one  school  to  several  specified 
next  schools,  and  the  dispersion  dilutes  the  numbers  of  sailors.  It  appears  that  the  retrieval  of 
this  type  of  data  must  be  generated  person  by  person.  Unless  such  distributions  have  useful 
stability, standard Renewal Theory models are not likely to be appropriate. Some interesting
related network flow models have been introduced [Lawphongpanich and Brown, 2000].
3. Description of Data


The  personnel  in  charge  of  the  NITRAS  database  are  very  cooperative.  However, 
specialized  data  requests  take  time  and  it  is  not  always  possible  to  obtain  exactly  what  we 
want. We decided to identify about two dozen prominent courses, by Course Identification
Number (CIN), in terms of total dead time and seek longitudinal data for each. The courses
having the more complete data are listed in Table 1. The Course Data Processing Code (CDP) is also
marked. (It can identify course location information, whereas the CIN cannot.) We acquired
information on them from 1996 through 1999.


CIN          CDP     CIN          CDP     CIN          CDP
A-800-0013   0133    A-623-0125   622N    B-330-0010   3257
A-202-0014   6668    A-730-0010   619D    C-602-2039   625U
A-100-0139   622L    A-661-0010   333K    C-222-2010   619K
A-041-0010   6400    A-661-0103   333L    P-500-0047   253L
A-500-0014   6102    A-431-0069   0519    C-622-2010   619K
A-100-0138   6672    A-651-0119   618J    C-100-2018   642Z
A-202-0014   6668    A-651-0118   617V    C-100-2020   625B
A-652-0298   6609

Table 1. Courses with the more complete data.

Initially, the basic categories (AI, II, AT) are marked as to reason (i.e., administrative,
legal, medical, and other); in the case of AI, on board prior to convening is a reason as well. We
are also interested in setbacks and attritions, which are less readily anticipated.


It  was  determined  that  some  important  holding-time  data  can  be  acquired  without 
sorting  through  the  individual  social  security  numbers.  The  courses  can  be  accessed  from  the 
time  point  of  their  convening.  The  day-by-day  events  are  recorded.  It  was  decided  to 
concentrate on the holding times from the course beginning until academic setback (as),
academic attrition (aa), and interrupted instruction (ii, for reasons other than recognized
holidays). At these epochs, the students counted may enter substates: presumably the aa’s go to
AT, the as’s to AI and the ii’s to II. It is not clear how this type of II differs from AI. We do
know that an expense is incurred when students leave the UI state. It is useful for the planner
to know how many leave, by course and by type (as, aa, ii), and how deeply into the course the
student has progressed when this change of state happens. It might be expected that the ii
variable is distributed uniformly over time, but the data do not show this. Moreover,
a particular sailor may experience several II’s during a single course. The
setbacks  and  attritions  may  be  related  to  the  portions  of  the  course  attempted.  If  so,  it  seems 
that  the  course  administrators  have  three  options:  redesign  the  affected  portions  of  the 
course,  review  the  admission  requirements  for  the  course,  or  continue  to  provide  for  the 
expense  of  placing  the  student  into  an  NUI  state. 


Accordingly, we proceed to model these processes. The models may be useful in
determining whether these distributions are stable from year to year, how dependent they are upon
the particular course, and whether the length of a course has an important effect.
4. Modeling

For each course, the number of students leaving the UI state on day $t$, $\{Y_t\}$ for $t = 1, 2, \ldots, n$,
where $n$ is the length of the course in days, is a non-homogeneous Poisson process with mean
value function $\{\lambda_t\}$. The modeling process involves finding a description of the $\{\lambda_t\}$ in terms
of  a  few  parameters,  testing  the  adequacy  of  the  fit,  and  assessing  the  annual  stability  of  the 
model.  Two  classes  of  models  were  considered:  those  of  the  sigmoid  learning-curve  type, 
and  the  more  general  step-function  type. 

It was believed that sigmoid models such as the logistic and Gompertz curves
[Hamilton, 1991] would be successful for this purpose, but such did not seem to occur with
regularity. We devised our own model, also of the sigmoid type, and had some success:

$\lambda_t = A \exp\{-a/t - (b-t)^c\}$

where  A,  a,  b  and  c  are  parameters  to  be  fitted.  This  function  stays  close  to  zero  in  the 
early  part  of  a  course  and  rises  sharply  to  a  single  modal  value.  Then  it  tapers  off  with  a 
long  right  tail.  This  function  captures  the  idea  that  there  is  little  in  the  way  of  attrition 
early  in  a  course,  and  then  things  change  quickly  as  the  early  attritions  bunch  up.  The 
subsequent  tapering  captures  a  reduced  amount  of  attrition  as  the  course  continues  from 
there.  But  success  with  models  of  this  form  was  limited. 

It was decided to work with simple step functions. That is, the sequence of days is
partitioned into $k$ intervals and $\lambda_t$ is constant on each member of the partition. The result is a
step function and, if $k$ is not large, it can capture the temporal behavior of the process. These
models are general, coarse and can serve to point the way to classes of smoother models.
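As an illustration of the step-function form, the short Python sketch below expands a set of break points and interval rates into a day-by-day rate sequence. The numbers are the combined-years academic attrition values for CDP 622L tabulated in Appendix A; the function name and the list representation are assumptions made for this example only.

```python
def daily_rates(breaks, rates):
    """Expand a step function into one rate per course day.

    breaks: right end points of the k intervals, in days since convening
            (the last entry is the course length).
    rates:  the constant rate on each interval.
    """
    lam = []
    start = 0
    for b, r in zip(breaks, rates):
        lam.extend([r] * (b - start))  # interval covers days start+1 .. b
        start = b
    return lam


# CDP 622L, academic attrition, combined years (Appendix A).
breaks = [20, 34, 90, 128, 141]
rates = [0.00, 0.36, 2.62, 4.00, 2.38]

lam = daily_rates(breaks, rates)
print(len(lam))            # one rate per day of the course
print(round(sum(lam), 2))  # expected number of events over the whole course
```

Under the Poisson assumption, the expected number of events in any stretch of days is simply the sum of the daily rates over that stretch, which is what makes the step form convenient for planning.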

The fitting process involves the specification of $k$ by the user, and the estimation of
the partition break points by maximum likelihood. A special algorithm was developed to
accomplish this, and was executed in a Master’s thesis [Li, 2000]. The effect and results of
using this algorithm are tabulated in Appendix A. The goodness of fit is judged by a
chi-square statistic with $n-k$ degrees of freedom.
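The fitting idea can be sketched as follows. This is not the algorithm of [Li, 2000], whose details are not given here; it is a minimal dynamic-programming search, written under the assumption that daily counts are independent Poisson variables, so that for a fixed partition the maximum likelihood rate on an interval is (events in interval)/(days in interval). The per-day Pearson form of the chi-square statistic is likewise an assumption, chosen because it is consistent with the stated $n-k$ degrees of freedom.

```python
import math


def fit_step_rates(counts, k):
    """Choose k intervals maximizing the Poisson profile log-likelihood.

    counts[t] is the number of events on day t+1. Returns (breaks, rates),
    where breaks are the right end points of the intervals.
    """
    n = len(counts)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of days")
    prefix = [0]
    for y in counts:
        prefix.append(prefix[-1] + y)

    def seg_score(s, e):
        # Profile log-likelihood of days s+1..e, up to terms free of the partition.
        total = prefix[e] - prefix[s]
        return total * math.log(total / (e - s)) if total > 0 else 0.0

    neg_inf = float("-inf")
    best = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for e in range(j, n + 1):
            for s in range(j - 1, e):
                if best[j - 1][s] == neg_inf:
                    continue
                score = best[j - 1][s] + seg_score(s, e)
                if score > best[j][e]:
                    best[j][e] = score
                    back[j][e] = s

    breaks, e = [], n
    for j in range(k, 0, -1):  # trace the optimal break points backwards
        breaks.append(e)
        e = back[j][e]
    breaks.reverse()

    rates, start = [], 0
    for b in breaks:
        rates.append((prefix[b] - prefix[start]) / (b - start))
        start = b
    return breaks, rates


def chi_square(counts, breaks, rates):
    """Per-day Pearson statistic and its n - k degrees of freedom."""
    stat, start = 0.0, 0
    for b, lam in zip(breaks, rates):
        if lam > 0:  # an interval with rate 0 has no events and contributes 0
            stat += sum((y - lam) ** 2 / lam for y in counts[start:b])
        start = b
    return stat, len(counts) - len(breaks)


# Synthetic daily counts with two obvious changes of level.
counts = [0] * 5 + [4] * 5 + [1] * 5
breaks, rates = fit_step_rates(counts, 3)
print(breaks, rates, chi_square(counts, breaks, rates))
```

The search examines every admissible set of break points, so its cost grows as $k n^2$; that is acceptable for courses of a few hundred days but a purpose-built algorithm may do better.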

This  worked  reasonably  well.  In  fact,  k  =  5  led  to  reasonable  fits  in  many  of  the 
cases.  But  it  did  not  hold  up  for  all.  The  value  k  =  7  serves  for  a  number  of  the  others. 

Turning  to  the  issue  of  temporal  stability,  it  would  be  useful  if  the  annual  models 
could  be  combined  into  a  single  choice  of  partitions  for  a  course.  In  support  of  this  goal,  a 
distance  function  was  developed  in  order  to  measure  the  separations  of  the  annual  functions 
from the pooled four-year mean value function. It is described in Appendix B. The use
of the pooled mean value function may be tenable, but such use is a judgment call.
5. Summary and Recommendations


The present work makes a beginning on the problem of anticipating the numbers of
sailors who enter an NUI state by means of academic attrition, academic setback, and
interrupted instruction, in terms of how long they have been in the course. Such losses are
not uniformly distributed over the length of the course. There is a low level early in a course;
the point of rise to more intense leaving activity is elusive; the use of unimodal models to
describe this curve may be tenable; and the behavior of the curve appears to vary course by
course and year by year. A more careful study of these processes would separate
the course segments according to their dates of convening and include the
enrollment numbers of each. Then sharper modeling can be applied. Of course, more CINs
should be included as well.

A  planner  would  need  to  know  the  convening  dates  and  the  enrollment  numbers  for 
the  courses  in  order  to  use  this  type  of  model  effectively.  The  statistical  anticipation  of 
losses  from  these  sources  could  be  used  to  review  the  administrative  aspects  of  the  courses  as 
well  as  for  budgetary  planning. 

The  study  of  the  larger  problem,  that  of  down  time  distributions,  requires  an 
investment  in  defining  the  processes  and  organizing  the  data  sources.  A  first  step  in  the 
longitudinal  analysis  of  the  NUI  time  in  the  pipelines  would  be  to  specify  the  data  needs. 
Presumably there are important classes of well-behaved pipes such as the one in Figure 1.
This is a schematic in which the network is a tree. The entrance node on
the  left  indicates  the  beginning  of  a  set  of  classes  (e.g.,  boot  camp,  followed  by  graduation 
and  moving  on  to  another  class).  The  solid  lines  mark  the  UI  time  and  the  dashed  lines 
indicate  the  NUI  time  between  courses.  In  a  perfect  world,  all  of  the  terminal  nodes  send 
sailors  to  the  fleet. 

Each path in the tree is a pipeline. The tree structure and the implied monotone time
scale as one moves from left to right indicate the nature of important dependencies between
pipes. Thus, several pipes have common starting points in time. It is assumed that the
sequencing of courses is rigid; that is, one must complete a course prior to enrolling in a
following course. A set of pipes of such a structure are superimposed on one another, but
with staggered starting times. One tree starts at time $t_0$, the next at $t_1 > t_0$, and so on. The
time scales of the trees need not be identical, but the second and later trees can provide
places to put the setbacks.
[Figure 1. A tree representation of course pipelines. Legend: Time UI, Time NUI. Horizontal axis: Days.]

The  analyst  needs  a  set  of  trees,  the  course  numbers  of  all  courses  in  a  pipe  and  their 
capacities,  the  convening  dates,  and  the  enrollment  and  graduation  numbers  for  each  course. 
When a change of location is involved between courses, it should be so marked. With data
of  this  type  one  can  do  the  following:  identify  the  efficient  and  inefficient  pipes;  compute 
losses  at  the  end  of  each  course  and  the  terminal  nodes;  determine  the  holding  time 
distributions;  and  set  priorities  for  the  next  set  of  actions  and  studies. 
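The data items listed above suggest a simple record per course offering. The Python sketch below is one hypothetical layout, not a description of any existing Navy database: the class name, field names, course labels, and numbers are all invented for illustration. It computes the loss at the end of each course and the scheduled gap between one course's graduation and the next course's convening, which is a schedule-level stand-in for NUI time rather than an individual holding time.

```python
from dataclasses import dataclass


@dataclass
class CourseOffering:
    cin: str                 # course identifier (hypothetical labels below)
    capacity: int            # classroom seats
    convening_day: int       # day the course convenes
    graduation_day: int      # day the course graduates
    enrolled: int
    graduated: int
    relocation_before: bool = False  # change of location needed to reach this course


def pipeline_summary(pipeline):
    """Per-course loss, seat utilization, and scheduled gap from the prior course."""
    rows = []
    previous_graduation = None
    for course in pipeline:
        gap = (None if previous_graduation is None
               else course.convening_day - previous_graduation)
        rows.append({
            "cin": course.cin,
            "loss": course.enrolled - course.graduated,
            "fill": course.enrolled / course.capacity,
            "scheduled_gap_days": gap,
            "relocation_before": course.relocation_before,
        })
        previous_graduation = course.graduation_day
    return rows


# One path through a tree: two courses taken in sequence (invented numbers).
pipeline = [
    CourseOffering("COURSE-A", 30, 0, 40, 28, 25),
    CourseOffering("COURSE-B", 25, 47, 110, 25, 22, relocation_before=True),
]
rows = pipeline_summary(pipeline)
for row in rows:
    print(row)
```

A full tree would hold several such paths sharing their early courses; the holding-time distributions themselves would still require person-level records.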

It  is  recommended  that: 

•  A  representative  set  of  pipelines  be  identified,  which  are  well  behaved  in 
that  they  process  substantial  numbers  of  sailors  and  reflect  important 
classes  of  end-product  skills. 

•  Databases  and  query  systems  be  generated  so  that  a  researcher  has 
convenient  access  to  the  information  outlined  above:  convening  and 
graduation  dates,  enrollment  and  graduation  numbers,  course  capacities 
and  locations,  dates  of  course  attritions  in  various  categories  and  policies 
for  managing  those  who  do  not  complete  courses  as  scheduled. 
Appendix A: Model Fitting Summaries


The  tables  serve  to  illustrate  the  results  of  model  fitting  and  the  amount  of 
variability.  This  is  done  for  the  temporally  combined  data  and  for  the  individual  years  in 
selected  cases.  The  use  of  five  intervals  in  a  partition  is  often  acceptable,  but  there  are 
courses  and  lost  time  types  for  which  five  is  inadequate  from  a  statistical  point  of  view. 
The  model  fitting  is  by  maximum  likelihood;  the  estimates  are  the  partition  break  points 
and  the  mean  value  rates.  The  number  of  partitions,  k,  is  user  supplied. 

The individual years bear little resemblance to each other and to the
combined years. The three types (academic attrition, academic setback, and interrupted
instruction) do not appear to behave in common patterns.

Legend.
b: the partition break points, days since inception.
p: the length of the interval, days.
λ: the estimated rate for the interval.
Y: the total number of events in the interval.

The length of the course is the last entry in the b column.


Academic  Attrition 


622L

Period       b    p       λ     Y
Combined    20   20    0.00     0
            34   14    0.36     5
            90   56    2.62   147
           128   38    4.00   152
           141   13    2.38    31
1996        20   20    0.00     0
            22    2    1.50     3
            34   12    0.00     0
           112   78    0.74    58
           141   29    1.59    46
1997        50   50    0.00     0
            52    2    1.00     2
            93   41    0.12     5
           106   13    1.00    13
           141   35    0.49    17
1998        41   41    0.00     0
            72   31    0.39     4
           135   62    0.75    47
           141    6    0.00     0
1999        34   34    0.06     2
            44   10    1.90    19
            48    4    0.25     1
            59   11    2.36    26
           141   82    0.98    80

619K

Period       b    p       λ     Y
Combined    10   10    0.30     3
            36   26    3.35    87
            45    9    6.56    59
            86   41    1.88    77
           110   24    1.04    25
1996        20   20    0.25     5
            46   26    1.15    30
            67   21    0.10     2
            71    4    1.25     5
           110   39    0.18     7
1997        34   34    0.26     9
            42    8    2.25    18
            85   43    0.63    27
            96   11    0.00     0
           110   14    0.57     8
1998        10   10    0.00     0
            28   18    1.50    27
            34    6    0.17     1
            51   17    1.41    24
           110   59    0.39    23
1999        12   12    0.00     0
            53   41    1.15    47
            84   31    0.39    12
            91    7    0.00     0
           110   19    0.32     6
618J

Period       b    p       λ     Y
Combined    26   26    0.00     0
            51   25    0.76    19
            55    4    0.00     0
            57    2    5.00    10
            81   24    1.25    30
1997        26   26    0.00     0
            55   29    0.14     4
            57    2    2.00     4
            62    5    0.00     0
            81   19    0.37     7
1998        31   31    0.00     0
            55   24    0.54    13
            60    5    1.40     7
            66    6    0.00     0
            81   15    0.67    10
1999        35   35    0.00     0
            56   21    0.10     2
            67   11    0.36     4
            70    3    1.67     5
            81   11    0.27     3

6668

Period       b    p       λ     Y
Combined    24   24    0.04     1
            52   28    0.61    17
            54    2    3.00     6
            56    2    0.00     0
            96   40    0.95    38

6609

Period       b    p       λ     Y
Combined    21   21    0.00     0
            34   13    0.46     6
            50   16    0.12     2
            51    1    2.00     2
            56    5    0.00     0

Academic  Setbacks 


622L

Period       b    p       λ     Y
Combined    13   13    2.00    26
            30   17   35.71   607
            41   11    7.18    79
           106   65   28.11  1827
           141   35   10.00   350
1996        22   22    2.00    44
            29    7   22.71   159
            57   28    2.18    61
           104   47    8.66   407
           141   37    3.00   111
1997        13   13    0.08     1
            29   16   13.81   221
            40   11    2.36    26
           107   67   10.11   677
           141   34    3.12   106
1998        41   41    0.00     0
            72   31    0.39    12
            73    1    4.00     4
           135   62    0.76    47
           141    6    0.00     0
1999        14   14    0.14     2
            80   66    6.61   436
            98   18    3.33    60
            99    1   28.00    28
           141   42    1.40    59


619K

Period       b    p       λ     Y
Combined     1    1   48.00    48
             8    7    0.43     3
            38   30   10.60   318
            79   41    2.56   105
           110   31    0.68    21
1996         1    1   20.00    20
             6    5    0.00     0
            17   11    3.18    35
            35   18    1.44    26
           110   75    0.29    22
1997         1    1   28.00    28
            13   12    0.33     4
            36   23    3.00    69
            59   23    0.00     0
           110   51    0.55    28
1998        10   10    0.10     1
            16    6    6.83    41
            38   22    2.50    55
            91   53    0.58    31
           110   19    0.00     0
1999         8    8    0.00     0
            42   34    2.85    97
            68   26    0.65    17
            70    2    3.50     7
           110   40    0.35    14

618J

Period       b    p       λ     Y
Combined    14   14    0.57     8
            27   13    6.92    90
            29    2   39.00    78
            54   25   10.28   257
            81   27    3.26    88
1997        14   14    0.14     2
            25   11    1.64    18
            29    4    8.50    34
            62   33    3.15   104
            81   19    0.74    14
1998        20   20    0.55    11
            27    7    3.29    23
            29    2   17.00    34
            52   23    3.48    80
            81   29    1.17    34
1999        20   20    0.40     8
            27    7    3.14    22
            29    2   12.00    24
            54   25    3.44    86
            81   27    1.00    27
Interrupted Instruction


622L

Period       b    p       λ     Y
Combined     7    7   14.71   103
             8    1  111.00   111
            85   77   10.23   788
            86    1  156.00   156
           141   55   54.78  3013
1996         7    7    5.14    36
             8    1   88.00    88
            69   61    8.10   494
            77    8    3.62    29
           141   64   11.97   766
1997        13   13    6.08    79
            47   34   11.71   398
            90   43   15.65   673
            92    2   35.00    70
           141   49   14.45   708
1998        27   27    4.56   123
            77   50    8.54   427
            78    1   31.00    31
           126   48    8.69   417
           141   15    7.53   113
1999        85   85    6.47   550
            86    1  103.00   103
            92    6    7.50    45
            93    1   27.00   127
           141   48   18.62   894

619K

Period       b    p       λ     Y
Combined    79   79    5.04   398
            80    1   30.00    30
            86    6    1.50     9
            87    1   23.00    23
           110   23    4.35   100
1997         4    4    6.50    26
            17   13    2.08    27
            24    7    0.43     3
            38   14    3.21    45
           110   72    0.82    59
1998        14   14    0.71    10
            15    1    8.00     8
            60   45    1.60    72
            61    1   10.00    10
           110   49    1.12    55
1999        79   79    1.66   131
            80    1   30.00    30
            86    6    0.00     0
            87    1   20.00    20
           110   23    2.78    64

618J

Period       b    p       λ     Y
Combined    17   17    6.53   111
            35   18   11.11   200
            36    1   34.00    34
            57   21   14.90   313
            81   24   10.62   255
1996        10   10     1.1    11
            22   12     0.0     0
            64   42     1.6    67
            79   15     0.4     6
            81    2     4.0     8
1997        35   35    3.97   139
            36    1   26.00    26
            54   18    7.67   138
            55    1   26.00    26
            81   26    5.31   138
1998        18   18    0.89    16
            67   49    2.33   114
            69    2    0.00     0
            74    5    3.40    17
            81    7    0.86     6
1999         4    4    0.50     2
            56   52    2.65   138
            57    1    8.00     8
            70   13    1.08    14
            81   11    3.55    39

6668

Period       b    p       λ     Y
Combined     6    6    1.83    11
            35   29    7.34   213
            40    5   15.40    77
            41    1    0.00     0
            96   55   12.33   678
1997        11   11    1.09    12
            12    1   12.00    12
            82   70    2.17   152
            91    9    5.44    49
            96    5    1.80     9
1998        13   13    0.69     9
            42   29    2.66    77
            46    4    6.75    27
            48    2    0.00     0
            96   48    4.85   233
1999        11   11    1.55    17
            12    1    9.00     9
            20    8    0.38     3
            88   68    4.84   329
            96    8    5.12    41

6609

Period       b    p       λ     Y
Combined     1    1    0.00     0
             7    6    9.83    59
            10    3   20.67    62
            12    2   33.00    66
            56   44    7.80   343
            56   56    5.00   530
1996         7    7    2.43    17
             9    2   20.00    40
            28   19    3.32    63
            29    1   20.00    20
            56   27    1.70    46
1997         3    3    0.00     0
             8    5    8.40    42
            10    2    0.00     0
            12    2   25.00    50
            56   44    3.59   158
1999         4    4    0.00     0
            40   36    2.06    74
            43    3    0.33     1
            44    1    8.00     8
            56   12    0.92    11
Appendix B: Measuring the Distance Between Two Models

The  class  of  models  is  the  family  of  simple  step  functions.  These  models  are 
precursors  to  smooth  curve  models  that  describe  the  mean  value  function  of  the  non- 
homogeneous  Poisson  processes  that  describe  the  attrition/setback/interruption  events 
that  occur  in  the  time  period  of  a  course.  The  distance  function  chosen  is  one  that  is 
compatible  with  this  more  encompassing  class  of  models.  The  step  functions  are  treated 
as  densities,  and  the  distance  between  two  such  functions  is  the  integral  of  the  magnitudes 
of  the  differences  separating  their  cumulative  distribution  functions. 

The  graph  below  will  illustrate  the  point.  The  two  models  are  described  by  their 
partition  points  (i.e.,  column  b  of  the  tables  in  Appendix  A).  When  k=5,  this  is  viewed  as 
a  distribution  over  five  points.  One  forms  the  cumulative  distribution  values  at  the 
epochs  of  change  and  connects  the  points  with  straight-line  segments.  The  graph  shows 
this  for  the  1996  and  1997  partition  distributions  of  course  622L  for  academic  attritions. 
The  course  length  is  141  days  and  each  model  has  a  partition  of  five  break  points.  The 
distance  between  the  two  is  the  magnitude  of  the  areas  separating  them,  measured  as  a 
percentage  of  the  area  of  the  containing  rectangle.  The  separation  of  the  two  curves 
shows  great  year-to-year  variability.  The  code  for  computing  this  distance  is  in 
Appendix  C. 
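The calculation just described can be sketched compactly in Python. This is not the S-Plus code of Appendix C; it is an independent illustration that assumes both models have the same number of break points and the same course length, and that each cumulative polygon starts at the origin. Because the two polygons then share their knot heights (0, 1/k, ..., 1), the area between them can be accumulated strip by strip along the vertical axis. The function name is an assumption made for this example.

```python
def cdf_distance(breaks_a, breaks_b):
    """Area between two piecewise-linear cdfs, as a percentage of the
    rectangle (course length) x 1 that contains them.

    Each argument is the list of k partition break points of one model
    (column b of the Appendix A tables).
    """
    if len(breaks_a) != len(breaks_b):
        raise ValueError("models must have the same number of break points")
    if breaks_a[-1] != breaks_b[-1]:
        raise ValueError("models must describe courses of the same length")
    k = len(breaks_a)
    height = 1.0 / k                      # each break point carries mass 1/k
    xa = [0.0] + list(breaks_a)
    xb = [0.0] + list(breaks_b)
    area = 0.0
    for i in range(k):
        d0 = xa[i] - xb[i]                # horizontal gap at the lower knot
        d1 = xa[i + 1] - xb[i + 1]        # horizontal gap at the upper knot
        if d0 * d1 >= 0:                  # no crossover: a trapezoid
            area += height * (abs(d0) + abs(d1)) / 2
        else:                             # crossover: two triangles
            area += height * (d0 * d0 + d1 * d1) / (2 * (abs(d0) + abs(d1)))
    return 100.0 * area / breaks_a[-1]


# CDP 622L, academic attrition: 1996 and 1997 break points from Appendix A.
b_1996 = [20, 22, 34, 112, 141]
b_1997 = [50, 52, 93, 106, 141]
print(round(cdf_distance(b_1996, b_1997), 1))
```

Applied to the 1996 and 1997 break points above, the sketch gives a distance of about 17 percent, in agreement with the corresponding entry of the distance tables that follow.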


[Figure: Superposition of Two Five-Point Distributions. Horizontal axis: days.]


The  following  sets  of  distance  tables  provide  an  image  of  the  distances  between 
the  models  for  various  years  with  a  single  course  and  event  type.  The  column  marked 
“all”  refers  to  the  model  that  combines  the  data  for  all  of  the  years.  The  distances  in  the 
“all”  column  are  not  generally  smaller  than  the  inter-year  distances. 
CDP 622L

Academic Attrition
        all     96     97     98     99
all       0
96     11.9      0
97     10.0   17.0      0
98      9.4   18.9    7.1      0
99     18.0   13.0   16.5   19.3      0

Academic Setback
        all     96     97     98     99
all       0
96      3.5      0
97      0.4    3.8      0
98     18.6   15.5   18.7      0
99     15.4   13.3   15.7   10.6      0

Interrupted Instruction
        all     96     97     98     99
all       0
96      3.5      0
97      7.9   11.5      0
98     17.5   20.9   10.3      0
99     24.1   27.7   16.2   14.8      0

CDP  619K 


Academic Attrition

        all     96     97     98     99
all       0
96      8.7      0
97     14.5    9.9      0
98      9.8   14.7   24.4      0
99     11.5    8.8    5.6   21.3      0

Academic Setback

        all     96     97     98     99
all       0
96     12.2      0
97      4.6     9.      0
98      5.3   17.5    8.4      0
99     13.3   23.5   14.4   11.8      0

Interrupted Instruction

        all     97     98     99
all       0
97     45.3      0
98     33.1    12.      0
99      0.0   45.3   33.1      0

CDP 618J


Academic Attrition

        all     97     98     99
all       0
97      2.7      0
98      5.7    3.0      0
99      9.6    6.9      4      0

Academic Setback

        all     97     98     99
all       0
97      2.5      0
98      2.0    4.4      0
99      1.5    4.0    0.5      0

Interrupted Instruction

        all     96     97     98     99
all       0
96     15.1      0
97      9.2   16.6      0
98     20.5   14.9    17.      0
99     14.8   11.1   14.0   10.1      0
Appendix C: S-Plus Code for Computing Distances Between cdfs

The first function, seg.comp(), computes the area of a polygon marked by crossover
points of the two cdfs. It is signed by the order of the input. The second function,
area.comp(), collects all of the signed area segments between the two cdfs in the first
column of the output. The other columns contain information useful in more extensive
applications. The third function, dist.mat(), develops the lower triangular distance
matrix for a collection of models; each column of the input matrix is the set of partition
points for a model. There is also an auxiliary program, sol.pt().

seg.comp <- function(x, w, u0, y0, n0)
{
    # fname is seg.comp
    # Computes the areas under the polygonal curves, between
    # two knots, and returns their difference (signed). A flag
    # is set = 1 if the x cdf is above the w cdf, and set = 2
    # otherwise. The x and w vectors are mono increasing; n0 is
    # the number of points in the original full sets. The initial points
    # (u0, y0) mark the beginning of the segment; the cross-over
    # point (u1, y1) is the segment end and is computed internally. A
    # special adjustment is made if there are no crossover points.
    ss <- sort(c(x, w))
    n <- length(x)
    flag <- 1
    if(w[1] == ss[1])
        flag <- 2
    j <- 1:n
    dx <- diff(x)
    dw <- diff(w)
    if(flag == 1 & sum(w[j] >= x[j]) == n) {
        # x cdf stays above the w cdf: no crossover
        area1 <- 0.5 * (y0 + 1) * (x[1] - u0) + dx %*% (j[ - n] + 0.5) + n *
            (w[n] - x[n])
        area2 <- 0.5 * (y0 + 1) * (w[1] - u0) + dw %*% (j[ - n] + 0.5)
        u1 <- x[n]
        y1 <- 0
        f <- n}
    else if(flag == 2 & sum(w[j] <= x[j]) == n) {
        # w cdf stays above the x cdf: no crossover
        area1 <- 0.5 * (y0 + 1) * (x[1] - u0) + dx %*% (j[ - n] + 0.5)
        area2 <- 0.5 * (y0 + 1) * (w[1] - u0) + dw %*% (j[ - n] + 0.5) +
            n * (x[n] - w[n])
        u1 <- w[n]
        y1 <- 0
        f <- n}
    else {
        ind <- j[x[j] >= w[j]]
        if(flag == 2)
            ind <- j[w[j] >= x[j]]
        f <- ind[1]
        if(f == 1) {
            area1 <- 0
            area2 <- 0
            u1 <- x[1]
            y1 <- 0}
        if(f > 1) {
            # make the end corrections.
            P1 <- c(x[(f - 1):f])
            P2 <- c(w[(f - 1):f])
            tout <- sol.pt(P1, P2)
            u1 <- tout[1]
            y1 <- tout[2] + f - 1
            area1 <- ((x[1] - u0) * (1 + y0))/2 + ((u1 - x[f - 1]) * (y1 + f -
                1))/2
            area2 <- ((w[1] - u0) * (1 + y0))/2 + ((u1 - w[f - 1]) * (y1 + f -
                1))/2}
        adj1 <- adj2 <- 0  # initialize the adjustments in the center
        if(f >= 3) {
            # adj1 <- (x[f - 1] * (2 * f - 3))/2 - (3 * x[1])/2
            # adj2 <- (w[f - 1] * (2 * f - 3))/2 - (3 * w[1])/2
            # }
            # if(f > 3) {
            # adj1 <- adj1 - sum(x[2:(f - 2)])
            # adj2 <- adj2 - sum(w[2:(f - 2)])
            j <- 2:(f - 1)
            dx <- diff(x[1:(f - 1)])
            dw <- diff(w[1:(f - 1)])
            adj1 <- dx %*% (j - 0.5)
            adj2 <- dw %*% (j - 0.5)}
        y1 <- y1 - f + 1
        area1 <- area1 + adj1
        area2 <- area2 + adj2}
    net <- (area1 - area2)/n0
    out <- c(net, flag, f, u1, y1)
    names(out) <- c("net", "flag", "f", "u1", "y1")
    out}

area.comp <- function(x, w, u0 = 0, y0 = 0)
{
    # fname is area.comp
    # Computes the signed net areas separating the empirical
    # cdfs of the ordered sets x and w. These cdfs are polygonal
    # curves which are connected with straight line segments. The
    # two data sets are of the same length.
    out <- NULL
    n0 <- length(x)
    jj <- 1
    repeat {
        out <- rbind(out, seg.comp(x, w, u0, y0, n0))
        assign("out", out, frame = 0)
        f <- out[jj, 3]
        u0 <- out[jj, 4]
        y0 <- out[jj, 5]
        x <- x[-1:(1 - f)]
        w <- w[-1:(1 - f)]
        n <- length(w)
        if(n == 1)
            break
        jj <- jj + 1}
    out}

dist.mat <- function(mat)
{
    # fname is dist.mat
    # Computes the distance between models of a course
    # for the several years. Input is matrix whose columns are the fitted
    # models. Output is lower triangular and in the percent of the area
    # of the rectangle.
    n <- ncol(mat)
    n0 <- nrow(mat)
    dd <- matrix(0, n, n)
    for(i in 1:(n - 1)) {
        j <- i + 1
        repeat {
            tmp <- area.comp(mat[, i], mat[, j])
            dd[j, i] <- sum(abs(tmp[, 1]) * 100)/mat[n0, 1]
            if(j == n)
                break
            j <- j + 1}}
    dd <- round(dd, 1)
    dd}


sol.pt <- function(P1, P2) {
    # fname is sol.pt
    # finds the cross over solution point
    # for two cdfs that have the same number
    # of pts in the horiz & equi spaced in the vert.
    x1 <- P1[1]
    x2 <- P1[2]
    w1 <- P2[1]
    w2 <- P2[2]
    delx <- x2 - x1
    delw <- w2 - w1
    x <- (x1 * w2 - w1 * x2)/(delw - delx)
    y <- (x - x1)/delx
    out <- c(x, y)
    out}
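
The crossover formula in sol.pt() follows from equating two straight lines that each rise by one unit step: the x line from (x1, 0) to (x2, 1) and the w line from (w1, 0) to (w2, 1). Setting (t - x1)/delx equal to (t - w1)/delw gives t = (x1*w2 - w1*x2)/(delw - delx). The sketch below restates this in Python for checking. The function name and the handling of parallel or vertical segments are additions for illustration and are not part of the S-Plus code.

```python
def crossover_point(p1, p2):
    """Crossover of two segments that each rise by one unit step.

    p1 = (x1, x2) and p2 = (w1, w2) are the times at which each cdf reaches
    two consecutive levels. Returns (t, y), where y is the fraction of the
    step completed at the crossover, or None if the segments are parallel.
    """
    x1, x2 = p1
    w1, w2 = p2
    delx = x2 - x1
    delw = w2 - w1
    if delw == delx:
        return None                      # parallel segments never cross
    t = (x1 * w2 - w1 * x2) / (delw - delx)
    if delx != 0:
        y = (t - x1) / delx
    else:
        y = (t - w1) / delw              # x segment is vertical; read y off the w segment
    return t, y


# Hypothetical example: the x cdf steps over [2, 4], the w cdf over [1, 5].
print(crossover_point((2, 4), (1, 5)))   # (3.0, 0.5)
```

In the example both lines are halfway up the step at t = 3, since (3 - 2)/2 = (3 - 1)/4 = 0.5.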